Searching for a Measure of Word Order Freedom
Authors
Abstract
This paper compares various measures of word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are the relative frequencies of all word order combinations of subject, predicate and object, in both main and subordinate clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.
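The abstract does not give formulas for the measures. A minimal sketch, assuming each language (or clause type) is represented by the relative frequencies of the six orders of subject (S), predicate (V) and object (O), and assuming, hypothetically, that distance and similarity are taken against a uniform distribution standing for completely free word order, might look like the code below. The function names, the uniform reference and the counts in the usage line are illustrative assumptions, not the authors' method or data; max-min distance is left out because the abstract does not define it.

```python
import math

# The six possible orders of subject (S), predicate (V) and object (O).
ORDERS = ["SVO", "SOV", "VSO", "VOS", "OSV", "OVS"]

# ASSUMPTION: a perfectly "free" language uses all six orders equally often.
UNIFORM = [1 / len(ORDERS)] * len(ORDERS)


def relative_frequencies(counts):
    """Turn raw counts per order into relative frequencies that sum to 1."""
    total = sum(counts.get(o, 0) for o in ORDERS)
    if total == 0:
        raise ValueError("no clauses counted")
    return [counts.get(o, 0) / total for o in ORDERS]


def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def entropy(p):
    """Shannon entropy in bits; orders that never occur contribute nothing."""
    return -sum(a * math.log2(a) for a in p if a > 0)


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def freedom_scores(counts):
    """Score one distribution of S/V/O orders against the uniform reference."""
    p = relative_frequencies(counts)
    return {
        "euclidean_to_uniform": euclidean_distance(p, UNIFORM),  # 0 = fully free
        "entropy_bits": entropy(p),  # log2(6) = fully free, 0 = one fixed order
        "cosine_to_uniform": cosine_similarity(p, UNIFORM),  # 1 = fully free
    }


# Hypothetical main-clause counts for one language (not data from the paper).
print(freedom_scores({"SVO": 900, "SOV": 50, "OVS": 30, "VSO": 20}))
```

Main and subordinate clauses can be scored separately by calling `freedom_scores` on each set of counts. Entropy needs no reference distribution, whereas the two geometric measures depend entirely on the choice of reference, which is why that choice is flagged as an assumption here.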
Similar resources
On Formalization of Word Order Properties
This paper contains an attempt to formalize the degree of word order freedom for natural languages. It exploits the mechanism of the analysis by reduction and defines a measure based on a number of shifts performed in the course of the analysis. This measure helps to understand the difference between the word order complexity (how difficult it is to parse sentences with more complex word order)...
Examining the Relationship between Preordering and Word Order Freedom in Machine Translation
We study the relationship between word order freedom and preordering in statistical machine translation. To assess word order freedom, we first introduce a novel entropy measure which quantifies how difficult it is to predict word order given a source sentence and its syntactic analysis. We then address preordering for two target languages at the far ends of the word order freedom spectrum, Ger...
Quantifying Word Order Freedom in Dependency Corpora
Using recently available dependency corpora, we present novel measures of a key quantitative property of language, word order freedom: the extent to which word order in a sentence is free to vary while conveying the same meaning. We discuss two topics. First, we discuss linguistic and statistical issues associated with our measures and with the annotation styles of available corpora. We find th...
Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek
One easily observable aspect of language variation is the order of words. In human and machine natural language processing, it is often claimed that parsing free-order languages is more difficult than parsing fixed-order languages. In this study on Latin and Ancient Greek, two well-known and well-documented free-order languages, we propose syntactic correlates of word order freedom. We apply our ...
An Intensity Measure for Seismic Input Energy Demand of Multi-Degree-of-Freedom Systems
Nonlinear dynamic analyses are performed to compute the maximum relative input energy per unit mass for 21 multi-degree-of-freedom (MDOF) systems with preselected target fundamental periods of vibration ranging from 0.2 to 4.0 s and six target inter-story ductility demands (1, 2, 3, 4, 6 and 8), subjected to 40 earthquake ground motions. The efficiency of several intensity measures as an ind...